Granada Province
The Infinite-Dimensional Nature of Spectroscopy and Why Models Succeed, Fail, and Mislead
Michelucci, Umberto, Venturini, Francesca
Machine learning (ML) models have achieved strikingly high accuracies in spectroscopic classification tasks, often without a clear proof that those models used chemically meaningful features. Existing studies have linked these results to data preprocessing choices, noise sensitivity, and model complexity, but no unifying explanation is available so far. In this work, we show that these phenomena arise naturally from the intrinsic high dimensionality of spectral data. Using a theoretical analysis grounded in the Feldman-Hajek theorem and the concentration of measure, we show that even infinitesimal distributional differences, caused by noise, normalisation, or instrumental artefacts, may become perfectly separable in high-dimensional spaces. Through a series of specific experiments on synthetic and real fluorescence spectra, we illustrate how models can achieve near-perfect accuracy even when chemical distinctions are absent, and why feature-importance maps may highlight spectrally irrelevant regions. We provide a rigorous theoretical framework, confirm the effect experimentally, and conclude with practical recommendations for building and interpreting ML models in spectroscopy.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
- Europe > Switzerland (0.05)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- (8 more...)
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- (25 more...)
- Health & Medicine (0.46)
- Transportation (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- North America > United States > Arizona > Maricopa County > Phoenix (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- (13 more...)
- North America > United States > Arizona > Maricopa County > Phoenix (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- (13 more...)
- Europe > Germany > Berlin (0.05)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > United Kingdom > Scotland > City of Glasgow > Glasgow (0.04)
- (5 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- Information Technology > Security & Privacy (0.68)